NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing

https://doi.org/10.1145/3640537.3641568

Wu, Qinzhe; Li, Ruihao; Beard, Jonathan; John, Lizy (February 2024, ACM)

Full Text Available
SPAMeR: Speculative Push for Anticipated Message Requests in Multi-Core Systems

https://doi.org/10.1145/3545008.3545044

Wu, Qinzhe; Ekanayake, Ashen; Li, Ruihao; Beard, Jonathan; John, Lizy (January 2023, International Conference on Parallel Processing)

Full Text Available
Online model swapping for architectural simulation

https://doi.org/10.1145/3457388.3458670

Lavin, Patrick; Young, Jeffrey; Vuduc, Richard; Beard, Jonathan (May 2021, The 18th International Conference on Computing Frontiers)
null (Ed.)
As systems and applications grow more complex, detailed computer architecture simulation takes an ever increasing amount of time. Longer simulation times result in slower design iterations which then force architects to use simpler models, such as spreadsheets, when they want to iterate quickly on a design. Simple models are not easy to work with though, as architects must rely on intuition to choose representative models, and the path from the simple models to a detailed hardware simulation is not always clear. In this work, we present a method of bridging the gap between simple and detailed simulation by monitoring simulation behavior online and automatically swapping out detailed models with simpler statistical approximations. We demonstrate the potential of our methodology by implementing it in the open-source simulator SVE-Cachesim to swap out the level one data cache (L1D) within a memory hierarchy. This proof of concept demonstrates that our technique can train simple models to match real program behavior in the L1D and can swap them in without destructive side-effects for the performance of downstream models. Our models introduce only 8% error in the overall cycle count, while being used for over 90% of the simulation and using models that require two to eight times less computation per cache access.
more » « less
Full Text Available
Virtual-Link: A Scalable Multi-Producer Multi-Consumer Message Queue Architecture for Cross-Core Communication

https://doi.org/10.1109/IPDPS49936.2021.00027

Wu, Qinzhe; Beard, Jonathan; Ekanayake, Ashen; Gerstlauer, Andreas; John, Lizy K. (May 2021, IEEE International Parallel and Distributed Processing Symposium (IPDPS))
null (Ed.)
Full Text Available
The Non-Uniform Compute Device (NUCD) Architecture for Lightweight Accelerator Offload

https://doi.org/10.1109/PDP50117.2020.00013

Asri, Mochamad; Dunham, Curtis; Rusitoru, Roxana; Gerstlauer, Andreas; Beard, Jonathan (March 2020, Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP))

Full Text Available
Multi-spectral Reuse Distance: Divining Spatial Information from Temporal Data

https://doi.org/10.1109/HPEC.2019.8916398

Cabrera, Anthony M.; Chamberlain, Roger D.; Beard, Jonathan C. (September 2019, Proceedings of IEEE High-Performance Extreme Computing Conference (HPEC))

The problem of efficiently feeding processing elements and finding ways to reduce data movement is pervasive in computing. Efficient modeling of both temporal and spatial locality of memory references is invaluable in identifying superfluous data movement in a given application. To this end, we present a new way to infer both spatial and temporal locality using reuse distance analysis. This is accomplished by performing reuse distance analysis at different data block granularities: specifically, 64B, 4KiB, and 2MiB sizes. This process of simultaneously observing reuse distance with multiple granularities is called multi-spectral reuse distance. This approach allows for a qualitative analysis of spatial locality, through observing the shifting of mass in an application's reuse signature at different granularities. Furthermore, the shift of mass is empirically measured by calculating the Earth Mover's Distance between reuse signatures of an application. From the characterization, it is possible to determine how spatially dense the memory references of an application are based on the degree to which the mass has shifted (or not shifted) and how close (or far) the Earth Mover's Distance is to zero as the data block granularity is increased. It is also possible to determine an appropriate page size from this information, and whether or not a given page is being fully utilized. From the applications profiled, it is observed that not all applications will benefit from having a larger page size. Additionally, larger data block granularities subsuming smaller ones suggest that larger pages will allow for more spatial locality exploitation, but examining the memory footprint will show whether those larger pages are fully utilized or not.
more » « less
Full Text Available

Search for: All records